Abstract Diffusion-based distributed learning approaches
have been found to be a viable solution for learning over
a network. However, the approaches proposed to date are
suited to linearly separable datasets and need to be
extended to scenarios in which a non-linearity must be
learned. In such scenarios, the recently proposed diffusion
kernel least mean squares (KLMS) has been found to perform
better than diffusion least mean squares (LMS). The
drawback of diffusion KLMS is that it requires unbounded
storage for observations (also called the dictionary). This
paper formulates diffusion KLMS in a fixed budget setting
such that the storage requirement is curtailed while
maintaining appreciable convergence performance.
Simulations have been carried out to validate the two newly
proposed algorithms, named quantised diffusion KLMS
(QDKLMS) and fixed budget quantised diffusion KLMS
(FBQDKLMS), against KLMS. They indicate that both proposed
algorithms deliver better performance than KLMS while
reducing the dictionary-size storage requirement.
1 Introduction


In this age, we are flooded with huge volumes of data on
which we aim to perform inference. With the development
of technologies, huge datasets distributed over networks
have come into existence. There are two popular ways of
processing this deluge of information, namely: a) centralised
processing, and b) distributed processing. In centralised
processing, a central node handles the entire inference
mechanism in the network via a central fusion center. This
process has two major drawbacks: a) the computational
requirement of the central node grows as the network
scales, and b) if the fusion center fails, the entire
centralised adaptation falls apart. To address these
challenges, a new branch of adaptive filtering called
diffusion distributed adaptive filtering [32] has emerged in
information processing. Examples of such scenarios occur in
wireless sensor networks. The nodes rely on observations
emanating from their neighbour nodes, hence reducing the
per-node computational cost while delivering performance
equivalent to that of centralised approaches.


1.1 Related Works 


Many techniques have been proposed for distributed adaptive
filtering in the existing literature. Incremental approaches
visit each node cyclically and adapt the linear adaptive
filter weights. However, they do not exploit all the data
available to a particular node for inference. Diffusion-based
algorithms exploit the weighted local information from the
neighbours of a node and have been established as a viable
solution for distributed adaptive filtering. This gives
faster convergence than incremental approaches at the cost
of slightly higher computational complexity.

In parallel, a new genre of adaptive filtering based on
kernel methods has become popular for non-linear inference.
These algorithms are suited to scenarios in which the
incoming data is not linearly separable. Popular algorithms
like least mean squares (LMS), the affine projection
algorithm (APA) and recursive least squares (RLS) have been
mapped to a reproducing kernel Hilbert space (RKHS) by the
“kernel trick”. Most popular among them are kernel least
mean squares (KLMS), kernel recursive least squares (KRLS)
[14] and kernel affine projection algorithms (KAPA). These
kernel adaptive filtering techniques theoretically require
infinite memory. To counter this shortcoming, algorithms
like quantised KLMS and fixed budget quantised KLMS [37]
have been proposed, which adaptively restrict the size of
the working dictionary.

Prior work, called diffusion-KLMS, combines the domains of
kernel adaptive filtering and distributed diffusion adaptive
filtering. However, as in KLMS, each node in the network is
assumed to have infinite storage capacity.


1.2 Motivation and Contributions 

In this paper, we address the important issue of restricting
the infinite dictionary-size requirement of diffusion-KLMS
to a finite value at all nodes in the network. To this end,
we propose two algorithms: a) quantised diffusion KLMS
(QDKLMS), which uses an online vector quantisation approach
to update the distributed dictionary, and b) fixed budget
quantised diffusion KLMS (FBQDKLMS), which gives a way of
pruning the dictionary over the network. The proposed
approaches, QDKLMS and FBQDKLMS, bound the per-node storage
requirement of diffusion-KLMS for estimating distributed
non-linear hypotheses. They also pave the way for
distributed non-linear hypothesis estimation in RKHS with a
finite dictionary size for datasets in which a linear/affine
separating boundary cannot be estimated.


1.3 Paper Outline 

The outline of this paper is as follows: a) Sect. 2 reviews
diffusion KLMS, b) Sect. 3 presents QDKLMS, c) Sect. 4
presents FBQDKLMS, d) Sect. 5 provides simulations to
validate the proposed approaches, and e) Sect. 6 draws the
conclusions of this paper.


2 Diffusion-KLMS 

Based on the KLMS algorithm, we review in this section its
distributed variant, which follows the diffusion approach.
We now define the matrices and symbols that will be used in
this paper. The matrix $Y = [y_{l,n}]$ denotes the output
corresponding to the $l$-th neighbour at the $n$-th time
instant. $E = [e_{l,n}]$ is the error matrix corresponding to
the $l$-th neighbour at the $n$-th time instant.
$X(n) = [\{x_{l,n}\}]$ is a matrix of measurement vectors from
the neighbours of node $q$ at time instant $n$, stacked
together; that is, $X(n)$ contains the data pertaining to all
$l$ neighbours stacked in row vector form. If there is no
vector from a node in the neighbourhood, it is replaced by
the zero vector in $X(n)$ and has a corresponding 0 entry
in $C$, which is a stochastic matrix of compatible dimension.
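
As a rough illustration of the stacking described above, the
following sketch builds the neighbourhood data matrix for one
node and a combination matrix with zero entries for absent
neighbours. The function names, the three-node line network
and the uniform neighbour weights are assumptions made only
for this example; the paper does not specify how the entries
of C are chosen.

```python
import numpy as np

def uniform_combination_matrix(adjacency):
    """Column-stochastic C: c[l, q] = 1/|N_q| if l is in the neighbourhood of q, else 0.

    Uniform weights are an illustrative assumption, not taken from the paper.
    """
    adj = np.array(adjacency, dtype=float)   # copy, so the caller's matrix is untouched
    np.fill_diagonal(adj, 1.0)               # every node belongs to its own neighbourhood
    return adj / adj.sum(axis=0, keepdims=True)

def stack_neighbour_data(observations, adjacency, q):
    """Rows of X(n) for node q: x_l for neighbours l, the zero vector otherwise."""
    obs = np.asarray(observations, dtype=float)
    in_neighbourhood = np.array(adjacency, dtype=bool)[:, q]
    in_neighbourhood[q] = True
    return obs * in_neighbourhood[:, None]

# Hypothetical three-node line network: 0 - 1 - 2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
x_n = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # one observation per node

C = uniform_combination_matrix(adjacency)
X_q0 = stack_neighbour_data(x_n, adjacency, q=0)  # node 2 is not a neighbour of node 0
fused_q0 = C[:, 0] @ x_n                          # weighted combination seen by node 0
```

Node 2 is outside the neighbourhood of node 0, so its row in
the stacked matrix is the zero vector and its weight in the
first column of C is 0.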

The distributed version of KLMS, diffusion KLMS, can be
formulated by considering the RBF analogy for KLMS, with
centers weighted by the innovations, and by splitting the
update into two steps, namely the diffusion step and the
incremental step:

$$y_{q,n} = \eta \sum_{i=0}^{n-1} \hat{\delta}_{q,i} \, \langle \phi(x_i), \phi(x_{q,n}) \rangle_{\mathcal{H}} \qquad (1)$$

where $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ denotes the
kernel inner product in RKHS. The $\{\hat{\delta}_{q,i}\}$,
which happen to be the innovations $e_{q,i}$ at each node,
are the weight-factor estimates at node $q$ at instant $i$
for the RBF centers. Hence, invoking the diffusion step
(step-1) for the weighting factors, we get:

$$\hat{\delta}_{q,n} = \sum_{l} a_{ql} \, e_{l,n} \qquad (2)$$

Using these estimates of the innovation-weights
$\hat{\delta}_{q,n}$ for a given node $q$ at the $n$-th
iteration, we perform the incremental step (step-2) as
follows:

$$\Omega_{q,n} = \eta \sum_{i=0}^{n-1} \hat{\delta}_{q,i} \, \phi(C X(i)) \qquad (3)$$

where $\Omega_{q,n}$ is the implicit learned parameter in
RKHS. Applying the kernel trick results in

$$y_{q,n+1} = \eta \sum_{i=0}^{n-1} \hat{\delta}_{q,i} \, \langle \phi(C X(i)), \phi(X(n)) \rangle_{\mathcal{H}} \qquad (4)$$

$\eta$ being the step-size.

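The outputs of the neighbouring nodes enter through the
weighted combination

$$\sum_{l \in \mathcal{N}_q} a_{ql} \, y_{l,n} \qquad (5)$$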

where $A$ is a stochastic matrix corresponding to the
probabilistic weights $\{a_{ql}\}$ and $\mathcal{N}_q$ denotes
the node indices of the neighbourhood of $q$. The error at
the $n$-th time instant at the $q$-th node is the mean of
$e$, transformed by $A$, over all possible $l$.

The algorithm is given below as the iteration of the
following three steps until convergence:

1. Estimate the outputs of node $l$ using the estimates of
error.

2. Form an estimate of the errors at time instant $n$ at each
node $l$. Let this be given by the vector $e(n)$, whose
$l$-th element is $e_{l,n}$. Then the error term for the
$l$-th node at the $n$-th time instant can be written as
$e_{l,n} = d_{l,n} - y_{l,n}$, where $d_{l,n}$ is the desired
value at the $l$-th node at the $n$-th time instant.

3. The error at each node is modified by the transformation
$A$ through the equation $\tilde{e}(n+1) = A\,e(n)$, where
$\tilde{e}(n)$ and $e(n)$ are vectors of error terms
corresponding to all the nodes (indexed by $l$) stacked
together.
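
The three steps above can be sketched in a few lines of
NumPy. This is a simplified reading under stated assumptions
and not a definitive implementation: the function and
variable names are hypothetical, a Gaussian kernel with
denominator 2*sigma^2 is assumed, the step-size is folded
into the stored coefficients, and the diffused error is
applied within the same time step. The sketch also makes the
storage problem visible, since every node appends one
dictionary entry per time step.

```python
import numpy as np

def gaussian_kernel(centres, x, sigma):
    """Gaussian kernel between each stored centre and x (assumed form)."""
    diff = centres - x
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * sigma ** 2))

def diffusion_klms(inputs, desired, C, A, eta=0.1, sigma=1.0):
    """inputs: (T, Q, dim) observations, desired: (T, Q) targets.

    C fuses observations (x'_q = sum_l c[l, q] x_l) and A diffuses errors.
    Returns per-step errors and the (ever-growing) per-node dictionaries.
    """
    T, Q, dim = inputs.shape
    centres = [np.empty((0, dim)) for _ in range(Q)]
    coeffs = [np.empty(0) for _ in range(Q)]
    errors = np.zeros((T, Q))
    for n in range(T):
        fused = C.T @ inputs[n]                       # row q holds the fused input of node q
        # Step 1: node outputs from the current dictionaries.
        y = np.array([coeffs[q] @ gaussian_kernel(centres[q], fused[q], sigma)
                      for q in range(Q)])
        # Step 2: local errors.
        e = desired[n] - y
        # Step 3: errors transformed by A.
        e_diffused = A @ e
        for q in range(Q):
            # Unbounded growth: one new centre per node per time step.
            centres[q] = np.vstack([centres[q], fused[q]])
            coeffs[q] = np.append(coeffs[q], eta * e_diffused[q])
        errors[n] = e
    return errors, centres, coeffs
```

After T time steps each node holds T centres, which is the
storage growth that the two proposed algorithms are designed
to curtail.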


3 Proposed Algorithm I: Quantised Diffusion KLMS 
(QDKLMS) 

In this section, we propose a novel algorithm for distributed
non-linear inference over a network with a finite dictionary
size, which lends itself to efficient implementation. The
proposal presented in this section curtails the unbounded
growth of the dictionary containing the innovations and
observations.

Let us introduce the notion of a dynamic dictionary, denoted
by $\{\mathcal{D}_{q,n}^{(j)}\}_{j=1}^{|\mathcal{D}_{q,n}|}$ (we
denote by $|\mathcal{D}|$ the cardinality of a set
$\mathcal{D}$) for the $q$-th node at the $n$-th time instant,
where $\mathcal{D}_{q,n}^{(j)}$ denotes the $j$-th entry of the
dictionary. This dictionary is filled with tuples of
innovations, whose contents are denoted by
$\alpha_{q,n}^{(j)}$, and corresponding observations
$x_{q,n}^{(j)}$.

As we are dealing with non-linear estimation over a network,
the local estimate of the implicit parameter $\Omega_q(n)$
for the $q$-th node at the $n$-th time instant in the RKHS is
given as follows:


$$\Omega_q(n) = \eta \sum_{j=1}^{|\mathcal{D}_{q,n}|} \alpha_{q,n}^{(j)} \, \phi(x_{q,n}^{(j)}) \qquad (6)$$

Fusing the observations from the neighbouring nodes by the
matrix $C$, we get the following modified current
observation:

$$x'_{q,n} = \sum_{\forall l} c_{lq} \, x_{l,n} \qquad (7)$$

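Taking the kernel inner product on both sides of (6) with
$\phi(x'_{q,n})$, we arrive at the following adaptation:

$$y_{q,n} = \eta \sum_{j=1}^{|\mathcal{D}_{q,n}|} \alpha_{q,n}^{(j)} \, \langle \phi(x_{q,n}^{(j)}), \phi(x'_{q,n}) \rangle_{\mathcal{H}} \qquad (8)$$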

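The kernel inner product is evaluated as

$$\langle \phi(x_{q,n}^{(j)}), \phi(x'_{q,n}) \rangle_{\mathcal{H}} = \exp\left(-\frac{\|x_{q,n}^{(j)} - x'_{q,n}\|^{2}}{2\sigma^{2}}\right) \qquad (9)$$

where $\sigma$ is the kernel spread parameter, a simulation
parameter determined by Silverman's rule. Note that we have
invoked the identity for the Hilbert adjoint operator,
$\langle Tx, y \rangle = \langle x, T^{*}y \rangle$, where $x$
and $y$ are arbitrary elements from a Hilbert space endowed
with the inner product $\langle \cdot, \cdot \rangle$, $T$ is
an operator and $T^{*}$ is its adjoint. In our case, $T$
would be given by the Kronecker product of $C$ with an
identity matrix.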
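
The adjoint identity can be checked numerically in a
finite-dimensional Euclidean analogue, where the adjoint of
a real matrix is its transpose. The sketch below is only
such an analogue (in the RKHS the operator acts on feature
maps); the function name, the 2-by-2 matrix and the vector
dimension are hypothetical choices for illustration.

```python
import numpy as np

def adjoint_gap(C, dim, seed=0):
    """Return |<Tx, y> - <x, T*y>| for T = kron(C, I_dim) and random x, y."""
    rng = np.random.default_rng(seed)
    T = np.kron(C, np.eye(dim))          # acts block-wise on stacked node vectors
    x = rng.standard_normal(T.shape[1])
    y = rng.standard_normal(T.shape[0])
    lhs = (T @ x) @ y                    # <Tx, y>
    rhs = x @ (T.T @ y)                  # <x, T*y>, with T* = T^T for a real matrix
    return abs(lhs - rhs)

# Hypothetical column-stochastic combination matrix for two nodes.
C = np.array([[0.5, 0.2],
              [0.5, 0.8]])
gap = adjoint_gap(C, dim=3)              # zero up to floating-point rounding
```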

The dictionary is updated based on an online vector
quantisation approach. This prevents the dictionary from
growing unboundedly. The first proposed algorithm is
formulated in Algorithm 1.


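Algorithm 1 Quantised Diffusion KLMS (QDKLMS)

Initialise step-size $\eta$, kernel width $\sigma$, quantisation threshold $\epsilon > 0$, and initial dictionary $\mathcal{D}_{q,0}$ for all nodes in the network $\mathcal{N}$ of size $|\mathcal{N}|$
while observations are available $\forall q$ do
    for $q = 1 : |\mathcal{N}|$ do
        $y_{q,n} = \eta \sum_{j=1}^{|\mathcal{D}_{q,n-1}|} \alpha_{q,n-1}^{(j)} \langle \phi(x_{q,n-1}^{(j)}), \phi(x'_{q,n}) \rangle_{\mathcal{H}}$
        $e_{q,n} = d_{q,n} - y_{q,n}$
        $j^{*} = \arg\min_{1 \le j \le |\mathcal{D}_{q,n-1}|} \|x'_{q,n} - x_{q,n-1}^{(j)}\|$
        if $\|x'_{q,n} - x_{q,n-1}^{(j^{*})}\| < \epsilon$ then
            $\mathcal{D}_{q,n} = \mathcal{D}_{q,n-1}$
            $\alpha_{q,n}^{(j^{*})} = \alpha_{q,n-1}^{(j^{*})} + e'_{q,n}$
        else
            $\mathcal{D}_{q,n} = \mathcal{D}_{q,n-1} \cup \{(e'_{q,n}, x'_{q,n})\}$
        end if
        $e'_{q,n+1} = \sum_{k} a_{qk} \, e_{k,n}$
    end for
end while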


Algorithm 1 checks whether the current observation is close
to some observation in the dictionary. If so, the center of
the corresponding observation in the dictionary is updated.
Otherwise, the new observation is appended to the
dictionary.
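
The check described above can be sketched for a single node
as follows. This is a minimal sketch under stated
assumptions: the function name and argument names are
hypothetical, the step-size is folded into the stored
coefficient, and on a match the coefficient attached to the
nearest stored observation is updated while the stored
observation itself is kept, as in standard quantised KLMS.

```python
import numpy as np

def quantised_update(centres, coeffs, x, err, eta, eps):
    """One quantised dictionary update; returns the new (centres, coeffs).

    centres: (m, dim) stored observations, coeffs: (m,) their coefficients.
    """
    if len(centres) > 0:
        dists = np.linalg.norm(centres - x, axis=1)
        j = int(np.argmin(dists))            # nearest stored observation
        if dists[j] < eps:
            # Close enough: merge into entry j, dictionary size unchanged.
            coeffs = coeffs.copy()
            coeffs[j] += eta * err
            return centres, coeffs
    # Sufficiently different from every entry: append a new one.
    centres = np.vstack([centres, x])
    coeffs = np.append(coeffs, eta * err)
    return centres, coeffs
```

A usage example: starting from an empty dictionary
(`np.empty((0, 2))`, `np.empty(0)`), an observation at the
origin creates the first entry, a second observation within
`eps` of it only changes that entry's coefficient, and a
distant observation creates a second entry.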

Thus we can see from Algorithm 1 that the dictionary grows
only if the observation received at a given node $q$ is
significantly “different” (with respect to the Euclidean
norm) from all the observations in the dictionary of the
$q$-th node. This prevents the dictionary size at a
particular node from growing unboundedly over the network.
However, this algorithm does not give a way to prune the
dictionary at the network nodes so as to discard
“unimportant” entries. The algorithm proposed in the
following section provides a method for online pruning of
the available dictionary, hence providing equivalent
convergence with a much smaller storage requirement.

4 Proposed Algorithm II: Fixed Budget Quantised 
Diffusion KLMS (FBQDKLMS) 


This section gives a method to prune the size of the
available dictionary by the techniques given in [37], which
are based on online estimation of a term called
“significance”. It is a measure of how much a particular
entry of a dictionary contributes to the overall learned
hypothesis. Significance is estimated in the following
manner depending on whether a
new center is added, merged or pruned. In case a center is
added, the significance $E_{q,n}^{(j)}$ of the $j$-th entry
of the dictionary at the $n$-th time instant and $q$-th node
is updated for all $1 \le j \le |\mathcal{D}_{q,n-1}|$
following [37] (eq. (10)), with a forgetting factor that is
much greater than 0 and less than 1. In case of merging, the
significance is updated through eq. (11), which relies on a
variable $\Gamma_{q,n}^{(j)}$ that is itself updated through
eqs. (12)-(14). In case of deletion/pruning of the $L$-th
dictionary entry, the significance is updated through
eq. (15).


Using these distributed online estimates of significance, the
proposed FBQDKLMS is given in Algorithm 2.

In Algorithm 2, we have two independent measures for
controlling the dictionary length. The first measure is the
Euclidean proximity of an incoming observation to a member
of the dictionary. If there is a member of the dictionary in
an $\epsilon$-neighbourhood of the regressor, the contents of
that member are updated. Otherwise, the new observation is
added to the dictionary. The second measure is a term called
significance, introduced in [37], which estimates the
overall contribution of a member of the dictionary to the
overall hypothesis learned and is estimated recursively. If
the significance of a member is the least among all elements
in the dictionary, that member is deleted from the
dictionary. This makes the dictionary more flexible by
giving mechanisms for both expanding and reducing it.
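
The pruning side of this mechanism can be sketched
independently of how significance is estimated. In the
sketch below the significance values are taken as given
inputs, the function name and the explicit `budget` argument
are hypothetical, and the re-estimation of significance
after a deletion is deliberately left out.

```python
import numpy as np

def prune_to_budget(centres, coeffs, significance, budget):
    """Delete least-significant entries until at most `budget` remain.

    significance: (m,) externally supplied estimates, one per dictionary entry.
    """
    centres = np.asarray(centres, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    significance = np.asarray(significance, dtype=float)
    while len(coeffs) > budget:
        j = int(np.argmin(significance))          # least significant member
        keep = np.arange(len(coeffs)) != j
        centres = centres[keep]
        coeffs = coeffs[keep]
        significance = significance[keep]
    return centres, coeffs, significance
```

Combined with a proximity-based update, this gives the two
controls discussed above: quantisation decides whether the
dictionary expands, and the budget decides when the least
significant member is discarded.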


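Algorithm 2 Fixed Budget Quantised Diffusion KLMS (FBQDKLMS)

Initialise step-size $\eta$, kernel width $\sigma$, quantisation threshold $\epsilon > 0$, and initial dictionary $\mathcal{D}_{q,0}$ for all nodes in the network $\mathcal{N}$
while observations are available $\forall q$ do
    for $q = 1 : |\mathcal{N}|$ do
        $x'_{q,n} = \sum_{l \in \mathcal{N}_q} c_{lq} \, x_{l,n}$
        $y_{q,n} = \eta \sum_{j=1}^{|\mathcal{D}_{q,n-1}|} \alpha_{q,n-1}^{(j)} \langle \phi(x_{q,n-1}^{(j)}), \phi(x'_{q,n}) \rangle_{\mathcal{H}}$
        $j^{*} = \arg\min_{1 \le j \le |\mathcal{D}_{q,n-1}|} \|x'_{q,n} - x_{q,n-1}^{(j)}\|$
        if $\|x'_{q,n} - x_{q,n-1}^{(j^{*})}\| < \epsilon$ then
            $\alpha_{q,n}^{(j^{*})} = \alpha_{q,n-1}^{(j^{*})} + e'_{q,n}$
            Update significance $E_{q,n}^{(j)}$ $\forall j$ as per the merging update
        else
            Update significance $E_{q,n}^{(j)}$ $\forall j$ as per the addition update
            Also update significance for the newly added tuple
        end if
        if the dictionary size exceeds the fixed budget then
            $j = \arg\min_{j} \{E_{q,n}^{(j)}\}$
            $\mathcal{D}_{q,n} = \mathcal{D}_{q,n} \setminus \{\mathcal{D}_{q,n}^{(j)}\}$
            Update significance $E_{q,n}^{(j)}$ $\forall j$ as per the pruning update
        end if
    end for
end while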


5 Simulations 

In this section, we provide simulations to validate the two
newly proposed algorithms against the existing literature.
We first discuss the simulation setup used in this paper. We
considered three simulation scenarios: a) the
non-stationary channel from [37], b) the crescent moon
dataset [36] and c) the spiral dataset. The scatter diagrams
of the crescent moon and spiral datasets are given in Fig. 1
and Fig. 2. For the proposed algorithms, the network nodes
are assumed to produce data randomly from the
above-mentioned datasets.

Subsequently, we discuss the simulation parameter values.
Throughout, the kernel width determined by Silverman's rule
was used, and the step-size $\eta$ was fixed to 0.1 to
compare all the algorithms. The fixed budget was varied
depending on the dataset such that both algorithms have
similar steady-state dictionary size.
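
The paper does not state which form of Silverman's rule was
used. As an illustration only, the sketch below implements
the common Gaussian rule-of-thumb bandwidth for n samples in
d dimensions; the function name and the averaging of the
per-dimension standard deviations are assumptions made for
this example.

```python
import numpy as np

def silverman_bandwidth(samples):
    """Rule-of-thumb kernel width: sigma_hat * (4 / ((d + 2) * n)) ** (1 / (d + 4)).

    samples: (n,) or (n, d) array. Averaging the per-dimension standard
    deviations into a single sigma_hat is an illustrative assumption.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                    # treat a flat array as n scalar samples
    n, d = x.shape
    sigma_hat = x.std(axis=0, ddof=1).mean()
    return sigma_hat * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
```

For scalar data this reduces to roughly 1.06 times the
sample standard deviation times n to the power -1/5, and the
resulting width scales linearly with the spread of the data.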

Finally, we provide the convergence results in Fig. 3,
Fig. 4 and Fig. 5. In Fig. 3, we consider the non-stationary
equalisation channel in [37] with binary input. It is
observed that QDKLMS and FBQDKLMS converge faster than KLMS
with a single node. It is also observed that QDKLMS and
FBQDKLMS exhibit similar transient behaviour, with FBQDKLMS
tending towards smaller dictionary sizes. In Fig. 4 and
Fig. 5, we find that the proposed approaches QDKLMS and
FBQDKLMS converge faster than KLMS, with FBQDKLMS having
superior convergence as compared to QDKLMS with similar or
lower storage requirement.

Fig. 1 Scatter diagram for synthetic dataset b)

To see how the two proposed algorithms compare against each
other as we scale the network size, we present simulations
in Fig. 6, Fig. 7 and Fig. 8. All outputs of these
simulations have been averaged over 200 Monte Carlo
iterations.
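
Averaging over independent runs can be sketched as follows.
The function name and the `run_once` callable, which is
assumed to return the error sequence of one simulation run
given a random generator, are hypothetical and stand in for
whichever algorithm is being evaluated.

```python
import numpy as np

def monte_carlo_mse(run_once, n_runs=200, seed=0):
    """Average squared-error learning curves over independent runs.

    run_once(rng) must return a 1-D array of errors for one run.
    """
    # Independent, reproducible random streams, one per run.
    children = np.random.SeedSequence(seed).spawn(n_runs)
    curves = [np.square(run_once(np.random.default_rng(child)))
              for child in children]
    return np.mean(curves, axis=0)
```

The returned curve has one mean squared error value per time
step, and the floor it settles at is the quantity compared
across network sizes.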

In Fig. 6, we consider the non-stationary equalisation
problem. We find that both proposed algorithms exhibit a
similar decreasing trend of the converged MSE floor as we
scale the network size. We also see that FBQDKLMS converges
to a lower dictionary-size requirement than QDKLMS for a
smaller number of nodes.

In Fig. 7 and Fig. 8, we compare the proposed algorithms on
the crescent moon and spiral datasets. We find that
FBQDKLMS converges to a lower error floor as we scale the
network size and converges to a lower dictionary size with
respect to the number of nodes in the inference network.

These simulations indicate that, given a network with
observations emanating from nodes equipped with finite
storage, the QDKLMS and FBQDKLMS algorithms are robust and
flexible for the assumed system models.



6 Conclusion and Future Work 

In this paper, two distributed kernel adaptive filtering
algorithms, namely quantised diffusion KLMS and fixed budget
quantised diffusion KLMS, were introduced, which work with a
limited dictionary across all the network nodes. These
algorithms have been found by simulations to converge faster
than other existing distributed adaptive filtering
algorithms available in the literature. Also, the proposed
algorithms need less memory and processing power, as they
work with a finite dictionary at all nodes, and hence can be
implemented in a practical system. Among the proposed
algorithms, FBQDKLMS is preferred, as its performance has
been found to be comparable to that of QDKLMS while tending
towards lower dictionary sizes. This work can be extended to
applications like wireless sensor networks, distributed
massive multiple input multiple output (MIMO) for 5G
applications and distributed spectrum sensing in cognitive
radio.
Fig. 6 Variation of MSE floor and converged network-size with increasing number of nodes for non-stationary equalisation.
Fig. 7 Variation of MSE floor and converged network-size with increasing number of nodes for crescent moon dataset b).
Fig. 8 Variation of MSE floor and converged network-size with increasing number of nodes for spiral dataset c).